In this notebook we use a heart failure dataset to build models that predict heart failure (the DEATH_EVENT column).
Importing libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.simplefilter("ignore")
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
heart=pd.read_csv("heart_failure.csv")
heart.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 299 entries, 0 to 298
Data columns (total 13 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   age                       299 non-null    float64
 1   anaemia                   299 non-null    int64
 2   creatinine_phosphokinase  299 non-null    int64
 3   diabetes                  299 non-null    int64
 4   ejection_fraction         299 non-null    int64
 5   high_blood_pressure       299 non-null    int64
 6   platelets                 299 non-null    float64
 7   serum_creatinine          299 non-null    float64
 8   serum_sodium              299 non-null    int64
 9   sex                       299 non-null    int64
 10  smoking                   299 non-null    int64
 11  time                      299 non-null    int64
 12  DEATH_EVENT               299 non-null    int64
dtypes: float64(3), int64(10)
memory usage: 30.5 KB
heart.describe()
| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 299.000000 | 299.000000 | 299.000000 | 299.000000 | 299.000000 | 299.000000 | 299.000000 | 299.00000 | 299.000000 | 299.000000 | 299.00000 | 299.000000 | 299.00000 |
| mean | 60.833893 | 0.431438 | 581.839465 | 0.418060 | 38.083612 | 0.351171 | 263358.029264 | 1.39388 | 136.625418 | 0.648829 | 0.32107 | 130.260870 | 0.32107 |
| std | 11.894809 | 0.496107 | 970.287881 | 0.494067 | 11.834841 | 0.478136 | 97804.236869 | 1.03451 | 4.412477 | 0.478136 | 0.46767 | 77.614208 | 0.46767 |
| min | 40.000000 | 0.000000 | 23.000000 | 0.000000 | 14.000000 | 0.000000 | 25100.000000 | 0.50000 | 113.000000 | 0.000000 | 0.00000 | 4.000000 | 0.00000 |
| 25% | 51.000000 | 0.000000 | 116.500000 | 0.000000 | 30.000000 | 0.000000 | 212500.000000 | 0.90000 | 134.000000 | 0.000000 | 0.00000 | 73.000000 | 0.00000 |
| 50% | 60.000000 | 0.000000 | 250.000000 | 0.000000 | 38.000000 | 0.000000 | 262000.000000 | 1.10000 | 137.000000 | 1.000000 | 0.00000 | 115.000000 | 0.00000 |
| 75% | 70.000000 | 1.000000 | 582.000000 | 1.000000 | 45.000000 | 1.000000 | 303500.000000 | 1.40000 | 140.000000 | 1.000000 | 1.00000 | 203.000000 | 1.00000 |
| max | 95.000000 | 1.000000 | 7861.000000 | 1.000000 | 80.000000 | 1.000000 | 850000.000000 | 9.40000 | 148.000000 | 1.000000 | 1.00000 | 285.000000 | 1.00000 |
heart.head(5)
| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.00 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.00 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.00 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.00 | 2.7 | 116 | 0 | 0 | 8 | 1 |
Sex - Gender of patient, Male = 1, Female = 0
Age - Age of patient
Diabetes - 0 = No, 1 = Yes
Anaemia - 0 = No, 1 = Yes
High_blood_pressure - 0 = No, 1 = Yes
Smoking - 0 = No, 1 = Yes
DEATH_EVENT - 0 = No, 1 = Yes
Now we check the dataset for null values
heart.isnull().sum()
age                         0
anaemia                     0
creatinine_phosphokinase    0
diabetes                    0
ejection_fraction           0
high_blood_pressure         0
platelets                   0
serum_creatinine            0
serum_sodium                0
sex                         0
smoking                     0
time                        0
DEATH_EVENT                 0
dtype: int64
sns.heatmap(heart.isnull(),cmap="Greens")
<AxesSubplot:>
# age and platelets are stored as floats; cast them to integers
# (anaemia is already int64, so it needs no cast)
heart["age"]=heart["age"].astype(int)
heart["platelets"]=heart["platelets"].astype(int)
heart
| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75 | 0 | 582 | 0 | 20 | 1 | 265000 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55 | 0 | 7861 | 0 | 38 | 0 | 263358 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65 | 0 | 146 | 0 | 20 | 0 | 162000 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50 | 1 | 111 | 0 | 20 | 0 | 210000 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65 | 1 | 160 | 1 | 20 | 0 | 327000 | 2.7 | 116 | 0 | 0 | 8 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 294 | 62 | 0 | 61 | 1 | 38 | 1 | 155000 | 1.1 | 143 | 1 | 1 | 270 | 0 |
| 295 | 55 | 0 | 1820 | 0 | 38 | 0 | 270000 | 1.2 | 139 | 0 | 0 | 271 | 0 |
| 296 | 45 | 0 | 2060 | 1 | 60 | 0 | 742000 | 0.8 | 138 | 0 | 0 | 278 | 0 |
| 297 | 45 | 0 | 2413 | 0 | 38 | 0 | 140000 | 1.4 | 140 | 1 | 1 | 280 | 0 |
| 298 | 50 | 0 | 196 | 0 | 45 | 0 | 395000 | 1.6 | 136 | 1 | 1 | 285 | 0 |
299 rows × 13 columns
We notice that the dataset contains no null values and no categorical (non-numeric) columns
Now we visualize the data before fitting it to models
print("HEART FAILED:",heart.DEATH_EVENT.value_counts()[1])
print("HEART NOT FAILED:",heart.DEATH_EVENT.value_counts()[0])
HEART FAILED: 96
HEART NOT FAILED: 203
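The two counts above already set a reference point for the models that follow. As a small sketch (plain Python, with the counts copied from the output above), the share of positive cases and the accuracy of a trivial "always predict the majority class" rule can be worked out like this:

```python
# Class counts copied from the printed output above.
failed, not_failed = 96, 203
total = failed + not_failed

# Share of patients with DEATH_EVENT == 1.
positive_rate = failed / total

# Accuracy of a rule that always predicts the majority class.
majority_baseline = max(failed, not_failed) / total

print(f"Positive rate: {positive_rate:.3f}")
print(f"Majority-class baseline accuracy: {majority_baseline:.3f}")
```

Any model accuracy reported later is easier to judge against this baseline than against 0%.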
fig, ax = plt.subplots(figsize=(6,6))
plt.pie(x=heart["sex"].value_counts(),colors=["violet","seagreen"],explode = ( 0.1,0.1),labels=["MALE","FEMALE"],shadow = True)
fig, ax = plt.subplots(figsize=(6,6))
plt.pie(x=heart["DEATH_EVENT"].value_counts(),colors=["green","red"],explode = ( 0.1,0.1),labels=["Heart not failed"," Heart failed"],startangle=90,shadow = True)
sns.countplot(x='DEATH_EVENT',data=heart,palette='rocket_r')
plt.show()
plt.subplot(1,2,1)
sns.countplot(x='DEATH_EVENT',data=heart,palette='rocket_r')
plt.subplot(1,2,2)
sns.countplot(x='DEATH_EVENT',hue="sex",data=heart,palette='viridis')
<AxesSubplot:xlabel='DEATH_EVENT', ylabel='count'>
plt.subplot(1,2,1)
sns.countplot(x='smoking',hue="DEATH_EVENT",data=heart,palette="inferno_r")
plt.subplot(1,2,2)
sns.countplot(x='diabetes',hue="DEATH_EVENT",data=heart,palette='viridis')
<AxesSubplot:xlabel='diabetes', ylabel='count'>
plt.subplot(1,2,1)
sns.countplot(x='high_blood_pressure',hue="DEATH_EVENT",data=heart,palette='seismic_r')
plt.subplot(1,2,2)
sns.countplot(x='anaemia',hue="DEATH_EVENT",data=heart,palette='gist_rainbow')
<AxesSubplot:xlabel='anaemia', ylabel='count'>
plt.figure(figsize=(7,5))
plt.subplot(1,2,1)
sns.countplot(x='smoking',hue='diabetes',data=heart)  # pass the column as x=, positional use is not accepted by newer seaborn
plt.subplot(1,2,2)
sns.countplot(x='high_blood_pressure',hue='sex',data=heart)
<AxesSubplot:xlabel='high_blood_pressure', ylabel='count'>
bins=list(range(0,301,30))  # time spans 4 to 285, so the bins must cover that range
plt.hist(heart.time,bins=bins,color='#1aadad')
plt.xticks(bins)
plt.xlabel("TIME")
plt.ylabel("COUNT")
plt.show()
d1 = heart[(heart["DEATH_EVENT"]==0) & (heart["sex"]==1)]
d2 = heart[(heart["DEATH_EVENT"]==1) & (heart["sex"]==1)]
d3 = heart[(heart["DEATH_EVENT"]==0) & (heart["sex"]==0)]
d4 = heart[(heart["DEATH_EVENT"]==1) & (heart["sex"]==0)]
label1 = ["Male","Female"]
label2 = ['Male - Survived','Male - Died', "Female - Survived", "Female - Died"]
values1 = [(len(d1)+len(d2)), (len(d3)+len(d4))]
values2 = [len(d1),len(d2),len(d3),len(d4)]
# Create subplots: use 'domain' type for Pie subplot
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=label1, values=values1, name="GENDER"),
1, 1)
fig.add_trace(go.Pie(labels=label2, values=values2, name="GENDER VS DEATH_EVENT"),
1, 2)
# Use `hole` to create a donut-like pie chart
fig.update_traces(hole=.4, hoverinfo="label+percent")
fig.update_layout(
title_text="GENDER DISTRIBUTION IN THE DATASET \
GENDER VS DEATH_EVENT",
# Add annotations in the center of the donut pies.
annotations=[dict(text='GENDER', x=0.19, y=0.5, font_size=10, showarrow=False),
dict(text='GENDER VS DEATH_EVENT', x=0.84, y=0.5, font_size=9, showarrow=False)],
autosize=False,width=900, height=400, paper_bgcolor="white")
fig.show()
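The four filtered frames d1 to d4 are only used for their row counts. As a hedged alternative, the same counts can be read from one cross-tabulation. The sketch below uses a tiny made-up frame named `sample` (hypothetical data, not the real dataset) so that it runs on its own; in the notebook `heart` would take its place.

```python
import pandas as pd

# Tiny made-up sample, only to show the call; it is NOT the heart failure data.
sample = pd.DataFrame({
    "sex":         [1, 1, 1, 0, 0, 1, 0, 1],
    "DEATH_EVENT": [0, 1, 0, 0, 1, 0, 0, 1],
})

# Rows are sex (0 = female, 1 = male); columns are DEATH_EVENT (0 = survived, 1 = died).
counts = pd.crosstab(sample["sex"], sample["DEATH_EVENT"])
print(counts)

# The same order as values2 in the cell above: male survived, male died, female survived, female died.
values2_demo = [counts.loc[1, 0], counts.loc[1, 1], counts.loc[0, 0], counts.loc[0, 1]]
print(values2_demo)
```

The same call with "diabetes", "anaemia" or "smoking" in place of "sex" would cover the other charts.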
d1 =heart[(heart["DEATH_EVENT"]==0) & (heart["diabetes"]==0)]
d2 = heart[(heart["DEATH_EVENT"]==0) & (heart["diabetes"]==1)]
d3 = heart[(heart["DEATH_EVENT"]==1) & (heart["diabetes"]==0)]
d4 = heart[(heart["DEATH_EVENT"]==1) & (heart["diabetes"]==1)]
label1 = ["No Diabetes","Diabetes"]
label2 = ['No Diabetes - Survived','Diabetes - Survived', "No Diabetes - Died", "Diabetes - Died"]
values1 = [(len(d1)+len(d3)), (len(d2)+len(d4))]
values2 = [len(d1),len(d2),len(d3),len(d4)]
# Create subplots: use 'domain' type for Pie subplot
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=label1, values=values1, name="DIABETES"),
1, 1)
fig.add_trace(go.Pie(labels=label2, values=values2, name="DIABETES VS DEATH_EVENT"),
1, 2)
# Use `hole` to create a donut-like pie chart
fig.update_traces(hole=.4, hoverinfo="label+percent")
fig.update_layout(
title_text="DIABETES DISTRIBUTION IN THE DATASET \
DIABETES VS DEATH_EVENT",
# Add annotations in the center of the donut pies.
annotations=[dict(text='DIABETES', x=0.20, y=0.5, font_size=10, showarrow=False),
dict(text='DIABETES VS DEATH_EVENT', x=0.84, y=0.5, font_size=8, showarrow=False)],
autosize=False,width=900, height=400, paper_bgcolor="white")
fig.show()
d1 = heart[(heart["DEATH_EVENT"]==0) & (heart["anaemia"]==0)]
d2 = heart[(heart["DEATH_EVENT"]==1) & (heart["anaemia"]==0)]
d3 = heart[(heart["DEATH_EVENT"]==0) & (heart["anaemia"]==1)]
d4 = heart[(heart["DEATH_EVENT"]==1) & (heart["anaemia"]==1)]
label1 = ["No Anaemia","Anaemia"]
label2 = ['No Anaemia - Survived','No Anaemia - Died', "Anaemia - Survived", "Anaemia - Died"]
values1 = [(len(d1)+len(d2)), (len(d3)+len(d4))]
values2 = [len(d1),len(d2),len(d3),len(d4)]
# Create subplots: use 'domain' type for Pie subplot
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=label1, values=values1, name="ANAEMIA"),
1, 1)
fig.add_trace(go.Pie(labels=label2, values=values2, name="ANAEMIA VS DEATH_EVENT"),
1, 2)
# Use `hole` to create a donut-like pie chart
fig.update_traces(hole=.4, hoverinfo="label+percent")
fig.update_layout(
title_text="ANAEMIA DISTRIBUTION IN THE DATASET \
ANAEMIA VS DEATH_EVENT",
# Add annotations in the center of the donut pies.
annotations=[dict(text='ANAEMIA', x=0.20, y=0.5, font_size=10, showarrow=False),
dict(text='ANAEMIA VS DEATH_EVENT', x=0.84, y=0.5, font_size=8, showarrow=False)],
autosize=False,width=900, height=400, paper_bgcolor="white")
fig.show()
d1 = heart[(heart["DEATH_EVENT"]==0) & (heart["smoking"]==0)]
d2 = heart[(heart["DEATH_EVENT"]==1) & (heart["smoking"]==0)]
d3 = heart[(heart["DEATH_EVENT"]==0) & (heart["smoking"]==1)]
d4 = heart[(heart["DEATH_EVENT"]==1) & (heart["smoking"]==1)]
label1 = ["No Smoking","Smoking"]
label2 = ['No Smoking - Survived','No Smoking - Died', "Smoking - Survived", "Smoking - Died"]
values1 = [(len(d1)+len(d2)), (len(d3)+len(d4))]
values2 = [len(d1),len(d2),len(d3),len(d4)]
# Create subplots: use 'domain' type for Pie subplot
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=label1, values=values1, name="SMOKING"),
1, 1)
fig.add_trace(go.Pie(labels=label2, values=values2, name="SMOKING VS DEATH_EVENT"),
1, 2)
# Use `hole` to create a donut-like pie chart
fig.update_traces(hole=.4, hoverinfo="label+percent")
fig.update_layout(
title_text="SMOKING DISTRIBUTION IN THE DATASET \
SMOKING VS DEATH_EVENT",
# Add annotations in the center of the donut pies.
annotations=[dict(text='SMOKING', x=0.20, y=0.5, font_size=10, showarrow=False),
dict(text='SMOKING VS DEATH_EVENT', x=0.84, y=0.5, font_size=8, showarrow=False)],
autosize=False,width=900, height=400, paper_bgcolor="white")
fig.show()
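sns.set_theme()  # seaborn styling; plt.style.use('seaborn') is deprecated since matplotlib 3.6
plt.axvline(0,c=(.5,.5,.5), ls='--')
plt.axhline(0,c=(.5,.5,.5), ls='--')
plt.scatter(heart.age,heart.time,c=heart.time , cmap="gist_rainbow",edgecolor='k');
plt.xlabel("age")
plt.ylabel("time")
plt.colorbar();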
fig, ax = plt.subplots(figsize=(8,6))
sns.heatmap(heart.corr(), annot=True, fmt='.1g', cmap='viridis');
fig, ax = plt.subplots(figsize=(14,8))
sns.histplot(x=heart["age"], kde=True, color='red')
<AxesSubplot:xlabel='age', ylabel='Count'>
healthy = 'heart not failed'
unhealthy = 'heart failed'
fig, axes = plt.subplots(nrows=1, ncols=2,figsize=(10, 4))
women = heart[heart['sex']==0]
men = heart[heart['sex']==1]
# sns.distplot is deprecated; sns.histplot draws the same count histograms (no KDE by default)
ax = sns.histplot(women[women['DEATH_EVENT']==1].age.dropna(), bins=18, label = unhealthy, ax = axes[0], color='red')
ax = sns.histplot(women[women['DEATH_EVENT']==0].age.dropna(), bins=40, label =healthy, ax = axes[0], color='green')
ax.legend()
ax.set_title('Female')
ax = sns.histplot(men[men['DEATH_EVENT']==1].age.dropna(), bins=18, label = unhealthy, ax = axes[1], color='red')
ax = sns.histplot(men[men['DEATH_EVENT']==0].age.dropna(), bins=40, label =healthy, ax = axes[1], color='green')
ax.legend()
ax.set_title('Male')
Text(0.5, 1.0, 'Male')
NOW BUILD THE MACHINE LEARNING MODELS
X = heart.drop("DEATH_EVENT", axis=1)
X.head()
| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75 | 0 | 582 | 0 | 20 | 1 | 265000 | 1.9 | 130 | 1 | 0 | 4 |
| 1 | 55 | 0 | 7861 | 0 | 38 | 0 | 263358 | 1.1 | 136 | 1 | 0 | 6 |
| 2 | 65 | 0 | 146 | 0 | 20 | 0 | 162000 | 1.3 | 129 | 1 | 1 | 7 |
| 3 | 50 | 1 | 111 | 0 | 20 | 0 | 210000 | 1.9 | 137 | 1 | 0 | 7 |
| 4 | 65 | 1 | 160 | 1 | 20 | 0 | 327000 | 2.7 | 116 | 0 | 0 | 8 |
Y = heart["DEATH_EVENT"]
Y.head()
0    1
1    1
2    1
3    1
4    1
Name: DEATH_EVENT, dtype: int64
Splitting the data
X_train, X_test, y_train, y_test = train_test_split(X, Y,train_size=0.8,test_size=0.2, random_state=1)
len(X_train), len(X_test),len(y_train), len(y_test)
(239, 60, 239, 60)
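The split above is random but not stratified, so the share of positive cases can differ between the training and test sets. The classification reports further down show 14 positives among the 60 test rows (about 23%), against about 32% in the full dataset. One possible way to keep the class ratio the same in both parts is the `stratify` argument of `train_test_split`. The sketch below is an assumption-labelled illustration, not the split used in this notebook: it runs on synthetic labels with the same class counts (203 zeros, 96 ones) and a placeholder feature column.

```python
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same class counts as the dataset (203 zeros, 96 ones).
# The single feature is just a row index; it is hypothetical and only there so the call runs.
demo_y = [0] * 203 + [1] * 96
demo_X = [[i] for i in range(len(demo_y))]

# stratify=demo_y asks for the same class proportions in the train and test parts.
demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, test_size=0.2, random_state=1, stratify=demo_y
)

train_rate = sum(demo_y_train) / len(demo_y_train)
test_rate = sum(demo_y_test) / len(demo_y_test)
print("train positive rate:", round(train_rate, 3))
print("test positive rate:", round(test_rate, 3))
```

Using this in the notebook would change the test set, so the scores reported below would change as well.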
logr = LogisticRegression()
logr.fit(X_train, y_train)
LogisticRegression()
Y_pred = logr.predict(X_test)
LogisticRegressionScore = logr.score(X_test, y_test)
cf = confusion_matrix(y_test, Y_pred)
cf
array([[44, 2],
[ 4, 10]], dtype=int64)
sns.heatmap(cf, annot=True, cmap='seismic_r')
plt.title("Confusion Matrix for Logistic Regression", fontsize=16, y=1)
Text(0.5, 1, 'Confusion Matrix for Logistic Regression')
print(classification_report(y_test,Y_pred))
precision recall f1-score support
0 0.92 0.96 0.94 46
1 0.83 0.71 0.77 14
accuracy 0.90 60
macro avg 0.88 0.84 0.85 60
weighted avg 0.90 0.90 0.90 60
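The figures in the report can be traced back to the confusion matrix. The sketch below (plain Python, no scikit-learn) recomputes precision, recall and F1 for each class from the logistic regression matrix shown above; the helper name `class_metrics` is made up for this illustration.

```python
# Confusion matrix for logistic regression, copied from the output above.
# Rows are true classes (0, 1); columns are predicted classes (0, 1).
cm = [[44, 2],
      [4, 10]]

def class_metrics(cm, cls):
    """Return (precision, recall, f1) for one class of a square confusion matrix."""
    tp = cm[cls][cls]
    fp = sum(row[cls] for row in cm) - tp   # predicted as cls, but truly another class
    fn = sum(cm[cls]) - tp                  # truly cls, but predicted as another class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

accuracy = sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))
print(f"accuracy: {accuracy:.2f}")
for cls in (0, 1):
    p, r, f = class_metrics(cm, cls)
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

For class 1 the recall is 10 / 14, meaning 4 of the 14 patients who died in the test set were missed by the model.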
acc_log = round(logr.score(X_train, y_train) * 100,4)
print(acc_log)
79.0795
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, y_train)
RandomForestClassifier()
Y_prediction = random_forest.predict(X_test)
random_forest.score(X_train, y_train)
1.0
cf = confusion_matrix(y_test, Y_prediction)
cf
array([[46, 0],
[ 3, 11]], dtype=int64)
sns.heatmap(cf, annot=True, cmap='viridis')
plt.title("Confusion Matrix for Random Forest", fontsize=16, y=1)
Text(0.5, 1, 'Confusion Matrix for Random Forest')
print(classification_report(y_test,Y_prediction))
precision recall f1-score support
0 0.94 1.00 0.97 46
1 1.00 0.79 0.88 14
accuracy 0.95 60
macro avg 0.97 0.89 0.92 60
weighted avg 0.95 0.95 0.95 60
acc_forest = round(random_forest.score(X_train, y_train) * 100,4)
print(acc_forest)
100.0
decision_tree = DecisionTreeClassifier()
decision_tree.fit(X_train,y_train)
DecisionTreeClassifier()
Y_predictions =decision_tree.predict(X_test)
decision_tree.score(X_train, y_train)
1.0
cf = confusion_matrix(y_test, Y_predictions)
cf
array([[37, 9],
[ 3, 11]], dtype=int64)
sns.heatmap(cf, annot=True,cmap="magma")
plt.title("Confusion Matrix for Decision tree", fontsize=16, y=1)
Text(0.5, 1, 'Confusion Matrix for Decision tree')
print(classification_report(y_test,Y_predictions))
precision recall f1-score support
0 0.93 0.80 0.86 46
1 0.55 0.79 0.65 14
accuracy 0.80 60
macro avg 0.74 0.80 0.75 60
weighted avg 0.84 0.80 0.81 60
acc_decision = round(decision_tree.score(X_train, y_train) * 100,4)  # score the decision tree itself, not the random forest
print(acc_decision)
100.0
Let us compare the three models to see which one performs best on this dataset
results = pd.DataFrame({
'Model': ['Logistic Regression',
'Random Forest',
'Decision Tree'],
'Score': [ acc_log,
acc_forest,
acc_decision]})
result_df = results.sort_values(by='Score', ascending=False)
result_df = result_df.set_index('Score')
result_df.head(9)
| Score | Model |
|---|---|
| 100.0000 | Random Forest |
| 100.0000 | Decision Tree |
| 79.0795 | Logistic Regression |
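In the table above, Random Forest and Decision Tree score higher than Logistic Regression, but these scores are measured on the training data (X_train, y_train)
A training accuracy of 100% shows that the two tree-based models fit the training set perfectly, not that they predict unseen patients perfectly. On the 60 test rows, the classification reports above give an accuracy of 0.95 for Random Forest, 0.90 for Logistic Regression and 0.80 for Decision Tree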
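The scores in the table come from `score(X_train, y_train)`, so they describe the training data. As a complementary view, test accuracy can be read directly off the three confusion matrices printed earlier. The sketch below does that in plain Python; the dictionary and function names are made up for the illustration, while the numbers are copied from the outputs above.

```python
# Confusion matrices on the 60 test rows, copied from the outputs above.
# Rows are true classes (0, 1); columns are predicted classes (0, 1).
confusion = {
    "Logistic Regression": [[44, 2], [4, 10]],
    "Random Forest":       [[46, 0], [3, 11]],
    "Decision Tree":       [[37, 9], [3, 11]],
}

def accuracy(cm):
    """Share of correct predictions: diagonal sum divided by the total count."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(map(sum, cm))

test_accuracy = {name: accuracy(cm) for name, cm in confusion.items()}

# Rank the models from best to worst test accuracy.
ranked = sorted(test_accuracy.items(), key=lambda item: item[1], reverse=True)
for name, acc in ranked:
    print(f"{name}: {acc:.2f}")
```

On this single test split the ranking differs from the training-score table: Logistic Regression sits between the two tree-based models.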
Let us validate our Random Forest with 10-fold cross-validation
rf = RandomForestClassifier(n_estimators=100)
scores = cross_val_score(rf, X_train, y_train, cv=10, scoring = "accuracy")
print("Scores:", scores)
print("Mean:", scores.mean())
print("Standard Deviation:", scores.std())
Scores: [0.79166667 0.75 0.875 0.875 0.875 0.79166667 0.83333333 0.70833333 0.875 0.86956522]
Mean: 0.8244565217391304
Standard Deviation: 0.05788858496518257
With 10-fold cross-validation, Random Forest has an average accuracy of about 82% with a standard deviation of about 6%
dt = DecisionTreeClassifier()  # named dt, since df usually refers to a DataFrame
scores = cross_val_score(dt, X_train,y_train, cv=10, scoring = "accuracy")
print("Scores:", scores)
print("Mean:", scores.mean())
print("Standard Deviation:", scores.std())
Scores: [0.83333333 0.79166667 0.75 0.75 0.75 0.75 0.70833333 0.66666667 0.79166667 0.86956522]
Mean: 0.7661231884057972
Standard Deviation: 0.055491883540175306
With 10-fold cross-validation, Decision Tree has an average accuracy of about 77% with a standard deviation of about 6%
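The summary percentages can be checked from the per-fold scores printed above. The sketch below recomputes the mean and standard deviation in plain Python; it uses the population standard deviation (`statistics.pstdev`) because NumPy's `std()` divides by n by default. The fold scores are copied from the two outputs above.

```python
from statistics import fmean, pstdev

# Per-fold accuracies copied from the cross-validation outputs above.
cv_scores = {
    "Random Forest": [0.79166667, 0.75, 0.875, 0.875, 0.875,
                      0.79166667, 0.83333333, 0.70833333, 0.875, 0.86956522],
    "Decision Tree": [0.83333333, 0.79166667, 0.75, 0.75, 0.75,
                      0.75, 0.70833333, 0.66666667, 0.79166667, 0.86956522],
}

summary = {}
for name, scores in cv_scores.items():
    mean = fmean(scores)
    std = pstdev(scores)  # population standard deviation, as numpy.std uses by default
    summary[name] = (mean, std)
    print(f"{name}: mean={mean:.4f} ({mean:.0%}), std={std:.4f} ({std:.0%})")
```

The two means are about one standard deviation apart, so the difference between the models is modest on this amount of data.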
We selected a dataset about heart failure. First we imported the required libraries and cleaned the data by checking for null values. We then visualized the data to understand it better, and built three models to predict heart failure: logistic regression, random forest and decision tree. On the training data, random forest and decision tree reached 100% accuracy and logistic regression about 79%; on the test set the accuracies were 95%, 80% and 90% respectively. Finally we applied 10-fold cross-validation to random forest and decision tree, which gave mean accuracies of about 82% and 77%.